Parallelization techniques for accelerating PageRank computation
Authors
Abstract
PageRank is a probability distribution representing the likelihood that a person who randomly clicks on links will arrive at any particular page. Let $G = [g_{ij}]_{i,j=1}^{n}$ be the adjacency matrix of a Web graph, with $g_{ij} = 1$ when there is a link from page $j$ to page $i$, $i \neq j$, and $g_{ij} = 0$ otherwise. From this matrix we construct the transition matrix $P = [p_{ij}]_{i,j=1}^{n}$ with $p_{ij} = g_{ij}/c_j$ if $c_j \neq 0$ and $p_{ij} = 0$ otherwise, where $c_j = \sum_{i=1}^{n} g_{ij}$, $1 \le j \le n$, is the number of out-links of page $j$. If every page has at least one out-link, $P$ is column stochastic, and the PageRank vector can be obtained by solving $P\pi = \pi$. The Power method is one of the oldest and simplest iterative methods for this eigenvector problem: when $P \ge O$ is stochastic, irreducible, and primitive (aperiodic), it converges to the eigenvector corresponding to $\lambda_{\max} = 1$.

However, the Web contains many pages without out-links. The columns of $P$ corresponding to these pages are zero, so $P$ is not stochastic and the Power method cannot be applied directly. Moreover, the Web graph is not strongly connected, so $P$ is not irreducible. To overcome these difficulties, Page and Brin [2] replace $P$ with the column-stochastic matrix

$$\bar{P} = \alpha\,(P + v d^{T}) + (1-\alpha)\, v e^{T},$$

where $d \in \mathbb{R}^{n}$ is defined by $d_i = 1$ if and only if $c_i = 0$, $e \in \mathbb{R}^{n}$ is the vector of all ones, and $v \in \mathbb{R}^{n}$ is some probability distribution over pages. With $0 < \alpha < 1$, the Power method can then be used to compute the stationary distribution of the ergodic Markov chain defined by $\bar{P}\pi = \pi$.
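The construction above can be sketched as a short, dependency-free Python function. This is a minimal sketch, not the paper's implementation: it assumes the graph is given as a dictionary `out_links` mapping each page index $j$ to the list of pages it links to (no duplicate links and no self-links, matching $i \neq j$), and it assumes a damping factor of 0.85 and a uniform $v$ when none is supplied. The name `pagerank_power` and these defaults are hypothetical. The sketch never forms the dense $\bar P$: each step computes $\alpha P\pi$ from the sparse links and then adds the scalar $\alpha\, d^{T}\pi + (1-\alpha)\, e^{T}\pi$ times $v$, which is exactly $\bar P\pi$.

```python
from typing import Dict, List, Optional


def pagerank_power(out_links: Dict[int, List[int]], n: int, alpha: float = 0.85,
                   v: Optional[List[float]] = None, tol: float = 1e-12,
                   max_iter: int = 1000) -> List[float]:
    """Power method for pi = P_bar pi, P_bar = alpha*(P + v d^T) + (1 - alpha)*v e^T."""
    if v is None:
        v = [1.0 / n] * n  # assumed default: uniform distribution over pages
    pi = list(v)  # start from a probability vector
    for _ in range(max_iter):
        new = [0.0] * n
        dangling_mass = 0.0  # d^T pi: total rank held by pages with c_j = 0
        for j in range(n):
            targets = out_links.get(j, [])
            if targets:
                # alpha * p_ij * pi_j with p_ij = 1 / c_j for each linked page i
                share = alpha * pi[j] / len(targets)
                for i in targets:
                    new[i] += share
            else:
                dangling_mass += pi[j]
        # Rank-one terms alpha*v*(d^T pi) + (1 - alpha)*v*(e^T pi), applied without forming them.
        coeff = alpha * dangling_mass + (1.0 - alpha) * sum(pi)
        for i in range(n):
            new[i] += coeff * v[i]
        residual = sum(abs(a - b) for a, b in zip(new, pi))  # L1 change between iterates
        pi = new
        if residual < tol:
            break
    return pi


if __name__ == "__main__":
    # Page 0 links to pages 1 and 2, page 1 links to page 2, page 2 has no out-links.
    ranks = pagerank_power({0: [1, 2], 1: [2]}, n=3)
    print([round(r, 4) for r in ranks])
```

Keeping $\bar P$ implicit means each iteration costs time proportional to the number of links rather than $n^2$, while the dangling page in the example still passes its rank on through $v$.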
Similar resources
Web-Site-Based Partitioning Techniques for Efficient Parallelization of the PageRank Computation
The efficiency of the PageRank computation is important since the constantly evolving nature of the Web requires this computation to be repeated many times. PageRank computation includes repeated iterative sparse matrix-vector multiplications. Due to the enormous size of the Web matrix to be multiplied, PageRank computations are usually carried out on parallel systems. Graph and hypergraph par...
A Web-Site-Based Partitioning Technique for Reducing Preprocessing Overhead of Parallel PageRank Computation
A power method formulation, which efficiently handles the problem of dangling pages, is investigated for parallelization of PageRank computation. Hypergraph-partitioning-based sparse matrix partitioning methods can be successfully used for efficient parallelization. However, the preprocessing overhead due to hypergraph partitioning, which must be repeated often due to the evolving nature of the...
Efficient Computation of PageRank
This paper discusses efficient techniques for computing PageRank, a ranking metric for hypertext documents. We show that PageRank can be computed for very large subgraphs of the web (up to hundreds of millions of nodes) on machines with limited main memory. Running-time measurements on various memory configurations are presented for PageRank computation over the 24-million-page Stanford WebBase...
Experiments with PageRank Computation
The PageRank algorithm is one of the most commonly used algorithms for determining the global importance of web pages. Due to the size of the web graph, which contains billions of nodes, computing a PageRank vector is very computationally intensive, and it may take anywhere from hours to months depending on the efficiency of the algorithm. This prompted many researchers to propose techniques to enhance ...
Fast PageRank Computation Via a Sparse Linear System (Extended Abstract)
The research community has devoted increasing attention to reducing the computation time needed by Web ranking algorithms. Many efforts have been devoted to improving PageRank [4, 23], the well-known ranking algorithm used by Google. The core of PageRank exploits an iterative weight assignment of ranks to the Web pages, until a fixed point is reached. This fixed point turns out to be the (dominan...
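The sparse-linear-system view named in the last entry can be derived from the definition of $\bar P$ in the abstract. Expanding $\bar P\pi = \pi$ with $e^{T}\pi = 1$ gives $(I - \alpha P)\pi = (\alpha\, d^{T}\pi + 1 - \alpha)\, v$. The right-hand side is a scalar multiple of $v$, so $\pi$ is the solution $x$ of $(I - \alpha P)x = v$ rescaled to sum to 1. Since $\alpha P \ge O$ has spectral radius at most $\alpha < 1$, $I - \alpha P$ is a nonsingular M-matrix, and Gauss-Seidel converges on it. The sketch below is one common formulation and is not necessarily the one used in that paper: the Gauss-Seidel solver, the hypothetical name `pagerank_linear_system`, the `out_links` input format (page index to list of linked pages, no self-links), and the defaults (α = 0.85, uniform $v$) are all assumptions for illustration.

```python
from typing import Dict, List, Optional, Tuple


def pagerank_linear_system(out_links: Dict[int, List[int]], n: int, alpha: float = 0.85,
                           v: Optional[List[float]] = None, tol: float = 1e-12,
                           max_sweeps: int = 1000) -> List[float]:
    """Solve (I - alpha*P) x = v by Gauss-Seidel, then normalize x to obtain pi."""
    if v is None:
        v = [1.0 / n] * n  # assumed default: uniform distribution over pages
    # Row access to P: in_links[i] lists (j, p_ij) for every link j -> i, with p_ij = 1/c_j.
    # Dangling pages (c_j = 0) have zero columns in P and so contribute no entries.
    in_links: List[List[Tuple[int, float]]] = [[] for _ in range(n)]
    for j, targets in out_links.items():
        for i in targets:
            in_links[i].append((j, 1.0 / len(targets)))
    x = list(v)
    for _ in range(max_sweeps):
        change = 0.0
        for i in range(n):
            # No self-links (p_ii = 0), so the diagonal of I - alpha*P is 1 and
            # x_i = v_i + alpha * sum_j p_ij x_j, reusing already-updated x_j in this sweep.
            new_xi = v[i] + alpha * sum(p * x[j] for j, p in in_links[i])
            change += abs(new_xi - x[i])
            x[i] = new_xi
        if change < tol:
            break
    total = sum(x)  # e^T x; dividing by it enforces e^T pi = 1
    return [xi / total for xi in x]


if __name__ == "__main__":
    # Same toy graph: 0 -> {1, 2}, 1 -> {2}, page 2 has no out-links.
    print(pagerank_linear_system({0: [1, 2], 1: [2]}, n=3))
```

In this form the dangling vector $d$ and the teleportation term only affect the scalar absorbed by the final normalization, so the solver works with $P$ alone; its result should agree, up to the tolerance, with the Power method applied to $\bar P$.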